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ABSTRACT 


Visual inspections of the first optical rest-frame images from JWST have indicated a surprisingly 
high fraction of disk galaxies at high redshifts. Here we alternatively apply self-supervised machine 
learning to explore the morphological diversity at z > 3. Our proposed data-driven representation 
scheme of galaxy morphologies, calibrated on mock images from the TNG50 simulation, is shown to be 
robust to noise and to correlate well with physical properties of the simulated galaxies, including their 
3D structure. We apply the method simultaneously to F200W and F356W galaxy images of a mass- 
complete sample (M. /Mc > 10°) at z > 3 from the first JWST/NIRCam CEERS data release. We 
find that the simulated and observed galaxies do not populate the same manifold in the representation 
space from contrastive learning, partly because the observed galaxies tend to be more compact and 
more elongated than the simulated galaxies. We also find that about half the galaxies that were 
visually classified as disks based on their elongated images actually populate a similar region of the 
representation space than spheroids, which according to the TNG50 simulation is occupied by objects 
with low stellar specific angular momentum and non-oblate structure. This suggests that the disk 
fraction at z > 3 as evaluated by visual classification may be severely overestimated by misclassifying 
compact, elongated galaxies as disks. Deeper imaging and/or spectroscopic follow-ups as well as 
comparisons with other simulations will help to unambiguously determine the true nature of these 


galaxies. 


Keywords: Galaxy formation (595); Galaxy evolution (594); High-redshift galaxies (734); Neural 


networks (1933); 


1. INTRODUCTION 


Understanding how galaxy diversity emerges across 
cosmic time is one of the major goals of galaxy forma- 
tion. How and when do stellar disks form? What are the 
main drivers of bulge growth? How and when did galaxy 
morphology and star formation get connected? Despite 
significant progress in the past years, thanks in partic- 
ular to deep surveys undertaken with the Hubble Space 
Telescope (e.g., Scoville et al. 2007; Grogin et al. 2011a; 
Koekemoer et al. 2011), these questions remain largely 
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unanswered. The general picture is that massive star- 
forming galaxies in the past were more irregular in their 
stellar structure (e.g., Abraham et al. 1996; Conselice 
2003) than today’s disks even if observed in the opti- 
cal rest-frame (Buitrago et al. 2013; Huertas-Company 
et al. 2015). Galaxies above z ~ 1 also show the pres- 
ence of giant star-forming clumps (e.g, Guo et al. 2015, 
2018; Huertas-Company et al. 2020; Ginzburg et al. 
2021) which might indicate a turbulent and unstable 
ISM (e.g., Ceverino et al. 2010; Bournaud et al. 2014). 
Although the gas shows signatures of rotation at z 2 
(e.g., Wisnioski et al. 2015), the settling of disks seems to 
be a process happening at least from z ~ 2 (e.g., Kassin 
et al. 2012; Buitrago et al. 2014; Simons et al. 2017; 
Costantin et al. 2022a) coincident with the decrease of 
gas fractions in massive galaxies (e.g., Freundlich et al. 
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2019; Genzel et al. 2010). Another important result of 
the past years is that the presence of bulges in galaxies 
is strongly correlated with the star formation activity at 
all redshifts probed (e.g., van der Wel et al. 2014a; Barro 
et al. 2017; Costantin et al. 2020, 2021; Dimauro et al. 
2022). This suggests that bulge formation and quench- 
ing are tightly connected physical processes (e.g., Chen 
et al. 2020b). 

With its unprecedented sensitivity, spatial resolution 
and infrared coverage, the JWST is offering a new win- 
dow to the stellar structure of galaxies in the first epochs 
of cosmic history (Gardner et al. 2006). For the first 
time, we are able to explore the stellar morphologies of 
the first galaxies formed in the universe, which should 
enable new constraints on the physical processes gov- 
erning galaxy assembly at early times and hopefully a 
better understanding of the physical processes leading 
to the formation of the first stellar disks and bulges. 
Some very recent works have already started this ex- 
ploration by performing visual classifications (Ferreira 
et al. 2022a,b; Kartaltepe et al. 2022) or by applying 
supervised Machine Learning trained on HST images 
(Robertson et al. 2023) of galaxies observed in the first 
JWST deep fields. One of the main results of these early 
works is that JWST seems to be detecting star-forming 
disks even at z > 3, which would push the time of disk 
formation to very early epochs. Two questions naturally 
arise from these first works: 


e Are the galaxies seen by JWST true disks, i.e 
flat, rotating systems? The aforementioned re- 
sults are based primarily on qualitative morpho- 
logical classifications, with quantitative tracers of 
morphology (e.g., Sersic fits) incorporated to fur- 
ther inform differences between the visually de- 
fined classes. However, galaxies might look mor- 
phologically disky but have significantly different 
stellar kinematics than local disk galaxies. Dis- 
tinguishing edge-on flat disks from more prolate 
systems is also a very challenging task that could 
bias the results (e.g. van der Wel et al. 2014b; 
Zhang et al. 2019). 


e Do modern cosmological simulations reproduce 
the observed galaxy diversity at z > 3? Although 
some preliminary comparisons exist, a fair com- 
parison in the observational plane is required to 
fully address this question (e.g., Rodriguez-Gomez 
et al. 2019; Huertas-Company et al. 2019; Zanisi 
et al. 2021). 


In this work, we attempt to provide new insights into 
these two main questions. To that purpose, we ap- 
ply a novel data-driven approach based on contrastive 


learning (Hayat et al. 2021; Sarmiento et al. 2021) to 
a mass complete sample of galaxies at z > 3 in the 
CEERS survey (Finkelstein et al. 2022a). By calibrat- 
ing the method with mock galaxies (Costantin et al. 
2022b) from the TNG50 cosmological simulation (Nel- 
son et al. 2019a; Pillepich et al. 2019; Nelson et al. 
2019b) and by choosing the proper augmentations (i.e., 
transformations applied to the images such as rotations, 
flux normalizations, noise, etc.), we are able to build 
a morphological description which is more robust to 
noise and galaxy orientation than more traditional ap- 
proaches. Our morphological representation can then 
be correlated with the physical properties of galaxies 
from the simulation to provide new insights about the 
physical nature of visually classified disks and to explore 
the agreements and disagreements between observations 
and simulations. 

The paper proceeds as follows: in section 2 we describe 
the galaxy datasets used in this work; section 3 describes 
the contrastive learning setting used to derive unsuper- 
vised representations of galaxy morphologies; section 4 
explores the properties of the obtained representations 
on observed JWST/CEERS galaxies; the results and im- 
plications are discussed in section 5; finally, a summary 
and the final conclusions are presented in section 6. 


2. DATA 
2.1. CEERS 


We use JWST imaging data from NIRCam obtained 
during the first epoch (June 21-22, 2022) of the Cosmic 
Evolution Early Release Science (CEERS; Finkelstein 
et al. 2017) survey. This consists of short and long- 
wavelength images in both NIRCam A and B modules, 
taken over four pointings, labeled NIRCam1, NIRCam2, 
NIRCam3, and NIRCam6. Each pointing was observed 
with seven filters: F115W, F150W, and F200W on the 
short-wavelength side, and F277W, F356W, F410M, and 
F444W on the long-wavelength side. Here we only use 
the F200W and F356W filters. These two filters probe 
the UV, optical and near-IR rest-frame at z > 3, allow- 
ing to probe simultaneously the distribution of young 
and old stars. A full description of this release and the 
data reduction can be found in Bagley et al. (2022) and 
Finkelstein et al. (2022b). 

In addition to the images, we use two different catalogs 
with physical properties of galaxies: 


e CEERS catalog (CEERS): a photometric catalog 
(Bagley et al. 2022; Finkelstein et al. 2022b) with 
derived stellar masses and photometric redshifts 
(Zphot) obtained through SED fitting of the lat- 
est data reduction photometry (Pablo G. Pérez- 
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Gonzalez private communication). For a fair com- 
parison with the simulated TNG50 dataset — 
see subsection 2.2— we select 891 galaxies with 
3 < z X 6 and stellar masses M, > 109Mo. We 
also impose the selected galaxies to have Kron- 
based fluxes in F200W and F356W filters above 
zero to avoid spurious sources. Following Pozzetti 
et al. (2010), we derived a completeness stellar 
mass of ~ 10" M, for the considered redshift range. 
'The subsample of the CEERS catalog used in this 
work is, therefore, mass-complete within 3 < z < 
6. 


e Visual classification catalog (VISUAL): a redshift- 
selected z > 3 morphological catalog presented 
in Kartaltepe et al. (2022) containing galaxies in 
common between CANDELS (Grogin et al. 2011b) 
and CEERS observations. This is intended to di- 
rectly compare our morphological description to 
the visual classification of Kartaltepe et al. (2022). 
Redshifts and stellar masses are extracted from 
CANDELS v2 for the HST F160W-selected galax- 
ies in the EGS field (see Kodra et al. 2022, for full 
details on the photometric redshift measurements 
and resulting catalogs). The morphological cata- 
log presented in Kartaltepe et al. (2022) consists 
of a redshift-selected sample with 850 galaxies at 
z 2 3. The visual classifications of each galaxy 
are performed by three people. A given clas- 
sification is assigned if two out of three people 
select that option. Galaxies classified in this way 
are broken down into the following non-exclusive 
morphological groups: Disk only, Disk+Spheroid, 
Disk--Irregular, Disk-4-Spheroid--Irregular, 
Spheroid only, Spheroid+Irregular and Irregu- 
lar only. See Kartaltepe et al. (2022) for more 
details on the different classification tasks and 
morphological groups. 


Both catalogs also include morphological measure- 
ments of Sérsic index (ne), semi-major axis (a) and axis- 
ratio (b/a) derived with galfit (Peng et al. 2010). More 
information about the fits can also be found in Kartal- 
tepe et al. (2022). 

The distributions of stellar masses of the galaxies in 
the CEERS and the VISUAL datasets are shown in Fig- 


ure 1. 


2.2. Mock JWST images of TNG50 galaxies 


We use the TNG50-1! suite of simulation (hereafter 
'TNG50; Nelson et al. 2019a; Pillepich et al. 2019; Nel- 


l https:/ /www.tng-project.org/ 
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Figure 1. Probability density function of the logarithm 
of the stellar mass of the simulated galaxies in the TNG50 
dataset (blue histogram), and the observed galaxies in the 
CEERS and VISUAL datasets (green and orange dashed his- 
togram, respectively). Different panels correspond to the 
redshifts analyzed, z = (3,4, 5,6). 


son et al. 2019b) and their mock NIRCam observations 
at z > 3 galaxies following the observational strategy 
of CEERS. The mock images? were produced by mod- 
eling the gas cells and star particles in the simulation 
as presented in Costantin et al. (2022b). We consider 
four snapshots of the TNG50 simulation correspond- 
ing to z = (3,4,5,6) and galaxies with stellar masses 
M, > 10°Mo. In total, the original dataset consists of 
1326 galaxies (see Table 1). Each selected galaxy is then 
observed along 20 different line-of-sight orientations to 
increase the statistics which produces a dataset of 26520 
galaxy images that we consider as independent objects 
for the purpose of this work. As described in Costantin 
et al. (2022b), parametric and non-parametric morpho- 


2 Data publicly released at https://www.tng-project.org/ 
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logical parameters for this dataset are derived using 
the standard configuration of statmorph?(Rodriguez- 
Gomez et al. 2019). 

For this work, we use the noiseless images in the 
F200W and F356W bands from the Costantin et al. 
(2022b) dataset with a pixel scale of 0.031 and 0.063 arc- 
sec pix !, respectively. The field-of-view of each image 
is equal to the total (dark matter, gas and star particles 
included) half-mass radius of the corresponding galaxy. 
Our image classification scheme requires a fixed image 
size. Therefore, we select galaxy images with a field-of- 
view larger than 64 x 64 and 32 x 32 pixels in the F200W 
and F356W bands, respectively, and generate cutouts of 
those sizes. Then, to match both observations of the 
same galaxy, images in the F356W band are re-sampled 
to the same pixel scale as the F200W images. According 
to these criteria, S; 7% of the galaxies (most of them at 
z = 3) are dropped out from our initial sample. The 
total number of galaxies considered is finally 1238 dis- 
tributed within z = 3— 6 (see Table 1), which translates 
into 24 760 projections. Although the number of objects 
we remove is small, we check in Figure 2 if a specific 
population is systematically removed. The figure shows 
the size-mass relation of the selected TNG50 dataset 
along with the excluded galaxies based on the size of 
the field-of-view. The excluded galaxies are not nec- 
essarily the most compact and/or less massive galaxies 
in the dataset. However, a fraction of them with lower- 
than-average stellar extent are indeed removed based on 
our selection and would reach otherwise sizes of a few 
hundreds of parsecs. 

For comparison, we show in Figure 1 the distribution 
of the stellar masses in the simulated TNG50 dataset, 
and the observed CEERS and VISUAL datasets. Note 
the good agreement between the TNG50 and the 
CEERS samples, even if we are comparing here the 
stellar masses directly extracted from the TNG50 sim- 
ulations and those obtained through SED fitting of the 
latest JWST data. Also remarkable is the agreement, 
despite selection effects, between the TNG50 and the 
VISUAL datasets. 


3. SELF-SUPERVISED LEARNING 
REPRESENTATION OF MOCK JWST IMAGES 
OF SIMULATED TNG50 GALAXIES 


In this section, we describe the main methodology we 
develop to obtain a data-driven morphological descrip- 
tion of galaxies that is robust to noise and other nuisance 
parameters. 


3 statmorph is available at https: //statmorph.readthedocs.io. 


Table 1. Summary of the sample of simulated galaxies from 
the TNG50 dataset. The first column indicates the redshift 
(z); the second column shows the total number of galaxies 
in the simulated TNG50 dataset; the third column refers 
to the number of selected galaxies according to image size 
limitations (i.e., 64 x 64 and 32 x 32 pixels in the F200W 
and F356W bands, respectively). 


z All galaxies Selected galaxies 


3-6 1326 1238 
3 829 760 
4 343 326 
5 115 113 
6 39 39 
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Figure 2. Logarithm of the physical size (stellar half-mass 
radius, rą, in kpc) versus the stellar mass (M., in Mc) of the 
TNG50 galaxies. Blue-filled contours show the 25%, 50% and 
75% probabilities of the TNG50 selected galaxies. Red data 
points correspond to the excluded galaxies in terms of the 
size of the field-of-view. See Table 1. 


3.1. Contrastive learning framework 


Our approach is based on an adaptation of the Sim- 
ple framework for Contrastive Learning of visual Rep- 
resentations (SimCLR; Chen et al. 2020a). Very briefly, 
the idea behind the SimCLR framework is to obtain ro- 
bust representations of images without labels by apply- 
ing random augmentations as explained below. 

Given an image, random transformations are applied 
to it to generate a pair of two augmented images, 
(x;,x;). Each image in the pair is passed through a 
Convolutional Neural Network (CNN) to compress the 
images into a set of vectors, (hi, hj). Then a non-linear 
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fully connected layer (i.e., projection head) is placed to 
get the representations (z;, zj). The representations are 
learned iteratively by maximizing agreement between 
the augmented views of the same image example (zi, zj) 
and minimizing agreement between all other pairs con- 
sidered as negative. This is achieved via a so-called con- 
trastive loss in the latent space, : 


exp((zi, 2) /h) 
alpen exp((zi, zi) /h)’ 


where (u,v) denotes the dot product between L?- 
normalized u and v, and h denotes the temperature 
parameter that regulates the distribution of the output 
representations (see Hinton et al. 2015; Wu et al. 2018, 
for more details). The final loss is computed in batches 
of size N across all positive pairs, both (i, j) and (7, i), 
while the rest of the augmented examples are treated as 
negative examples, which are denoted by k. 

For this work, we follow the implementation from 
Sarmiento et al. (2021) which was successfully applied 
to astronomical data. The CNN encoder consists of 
four convolutional layers with kernel sizes 5, 3, 3, 3, 
and 128, 256, 512 and 1024 filters per layer, respec- 
tively. Max-pooling layers and Exponential Linear Unit 
(ELU) activation functions are placed after each convo- 
lutional layer. Therefore, the representations before the 
projection head —(hi;, h;)— for each galaxy image are 
encoded into 1024 features. Subsequently, the projec- 
tion head (composed of three fully connected layers of 
512, 128 and 64 neurons per layer) transforms the galaxy 
representations to a latent space —(2;, 2;)— where the 
contrastive loss is computed. 


li, = log 


(1) 


3.2. Data augmentation and network training 


The choice of data augmentations is a key element in 
contrastive learning training (Chen et al. 2020a) as it al- 
lows us to turn the representations independent of some 
nuisance effects. In the context of this work, our goal is 
to obtain a morphological representation that is robust 
to signal-to-noise, rotation, size and does not depend on 
color. To reach this objective, we calibrate our algo- 
rithm on the mock TNG50 dataset, since it allows us 
to access noiseless versions of the images and, therefore, 
marginalize over the noise. More precisely, we apply the 
following augmentations: 


e Noise: as described in section 2, for each galaxy 
image in the F200W and F356W bands we have 
a noiseless version that does not include any in- 
strumental effects or noise. Using the available 
CEERS data, we construct mock CEERS galaxy 
images as a combination of the TNG50 noise- 


less images and random patches of the four ob- 
served CEERS pointings. We first add source 
Poisson noise before convolving the images with a 
PSF extracted from the observations (Finkelstein 
et al. 2022b) in each band. Then, we add real- 
time realistic noise (that may also include other 
sources/interlopers) to each of the 64 x 64 noiseless 
galaxy images by summing up randomly chosen 
patches from the CEERS pointing of the same size. 
From the contrastive learning point of view, these 
images with a real background are considered as an 
augmented copy of their noiseless analogs during 
the training process. These augmentations should 
enforce the representations to be robust to signal- 
to-noise (S/N) as well as to background and fore- 
ground companions. 


Rotation: we apply a random flipping (horizon- 
tal or vertical, but not both) and a random ro- 
tation with 100% chance independently to both 
the TNG50 noiseless image and the patch of the 
sky extracted from the CEERS pointings. This 
augmentation is intended to ensure the model is 
invariant to the galaxy orientation; 


Flur: we randomly apply independent flux scal- 
ing to the noise-added images. The associated flux 
is random but follows the flux distribution of the 
'TNG50 dataset in the F356W band. We randomly 
sample flux values from the TNG50 F356W flux 
distribution and apply them accordingly to noise- 
added images in the two filters. This augmenta- 
tion is intended to stress the robustness to S/N. 
It also helps —as we will show in the following— 
to make the representations independent of galaxy 
size since the regions above the noise scale will vary 
with the flux variations. We note that we decided 
not to apply direct size augmentation, (i.e. such 
as zoom-in or zoom-out), as that would force us to 
up-sample or down-sample the images with less or 
more than 64 x 64 pixels size, respectively, which 
creates some artifacts that the network learns. 


Color: finally, in order to prevent the network 
from learning color information and/or the intrin- 
sic brightness of the galaxies directly from the im- 
ages, we apply two additional augmentations to 
both the noiseless and the noise-added images. 
First, each band is normalized individually after 
the augmentations are applied. Second, the noise- 
less and noise-added images are normalized inde- 
pendently and individually for each galaxy. Conse- 
quently, the maximum pixel value in every galaxy 
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Figure 3. Images of a TNG50 galaxy at z = 3 with 
M. z6x 10°Mo. The first two rows show noiseless and 
augmented versions, respectively, of 10 different projections 
of the selected galaxy in the F200W filter. The third and 
fourth rows correspond to the noiseless and augmented ver- 
sions, respectively, of the same galaxy projections shown in 
the first two rows in the F356W filter. All the panels have a 
64 x 64 pixels size. The pixel values have been asinh trans- 
formed with a 0.5% clipping. 


image is equal to one for each band and counter- 
image. 


In Figure 3, we show several projections of a galaxy 
at z = 3 with M, ~ 6 x 10°Mọ extracted from the 
'TNG50 simulation. In some of the augmented versions 
(along with PSF convolution, realistic noise, random ro- 
tations, flux variation, etc.) it is possible to distinguish 
several companions within the stamps. This is the re- 
sult of adding randomly chosen patches of the CEERS 
pointing to the TNG50 noiseless images. These exam- 
ples come from an extended bright galaxy that appears 
significantly weaker in the augmented images due to the 
flux variations applied in the augmentations and given 
that the TNG50 distribution of F356W peaks at lower 
values (compared to the selected galaxy). 

'To reduce the dynamic range and to be sensitive to 
both the center and outskirts of the galaxy, before train- 
ing, we apply a asinh (inverse hyperbolic sin) trans- 
formation and a minimum-maximum normalization to 
each galaxy pair in each band. Additionally, we ensure 
that only one projection of the same galaxy enters each 
batch during training and that all the galaxy images 
are passed through the network at every epoch. We 
do so to avoid the algorithm learning the orientation of 
the same galaxy as seen from different line-of-sight pro- 
jections since some of the projections are just a simple 
rotation of the galaxy in the sky. 

Our contrastive SimCLR model is trained with the 
mock JWST images for 24760 different projections of 
1238 galaxies within 3 < z < 6 and with stellar 
masses M, > 10? M, in the two observed bands (F200W 
and F356W) with a temperature parameter h = 0.5 
(that controls the strength of penalties on hard negative 
pairs). We randomly split our dataset into a training 


and a test sample consisting of 1100 and 138 galaxies, 
respectively. This translates into a training and a test 
dataset of 22000 and 2760 galaxy images, respectively. 
We train our algorithm with a batch size of N — 550 
(i.e., half the number of galaxies in the training set) for 
1500 epochs in a GPU NVIDIA T4 Tensor Core with 
16 GB of RAM. Random data augmentation is applied 
every 50 epochs to increase the variability during the 
training process. 


3.3. Visualization of the representation space 


We can hence analyze the properties of the representa- 
tion space learned by the SimCLR framework presented 
in the previous section. 

We start by visualizing how the TNG50 galaxy im- 
ages, both with and without noise, are distributed in 
the representation space. Since the representation space 
for each galaxy image is encoded into 1 024 features, we 
perform a dimensionality reduction from the 1 024 fea- 
tures to a 2D space to facilitate the visualization and 
interpretation of this representation. For that purpose, 
we use the Uniform Manifold Approximation and Pro- 
jection (UMAP; McInnes et al. 2018) method with stan- 
dard initial parameters (metric = euclidean, n_neighbors 
= 15 and min dist = 0.1). The UMAP algorithm seeks 
to learn the manifold structure of the input data and find 
a low-dimensional embedding that preserves the essen- 
tial topological structure of that manifold. It is therefore 
a way to visualize in 2D the representations learned by 
the self-supervised network. Before applying the UMAP 
technique, we assume the same distance metric in the 
representation space as the one used to calculate the 
contrastive loss in the head projection space and, there- 
fore, we normalize the representations with an L? norm 
such that the euclidean and cosine distances between 
representations are equivalent. The two coordinate axes 
in the UMAP representation do not have any precise 
physical meaning and are a combination of the 1 024 di- 
mensions extracted by the contrastive learning setting. 

It is important to emphasize that the contrastive ap- 
proach is not intended for dimensionality reduction but 
for obtaining a robust representation of galaxy mor- 
phologies in a different space than images, which ex- 
plains the high dimensionality of the representation 
space. Several works have shown indeed that the per- 
formance of contrastive learning increases with repre- 
sentations of higher dimension (e.g.,Chen et al. 2020a). 
The UMAP projection is used only for visualization pur- 
poses. 

In Figure 4, we show random examples of galaxies in 
the F356W filter (both with and without noise) in the 
UMAP 2D space. Note that some stamps on the right- 
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Figure 4. Randomly chosen images of the simulated TNG50 galaxies in the UMAP visualization. The UMAP space is binned 
and one galaxy image per bin is shown. The left panel shows the noiseless versions of the training galaxies, while the reduced 
versions (TNG50 + random CEERS patch) are shown in the right panel. Note that there is not a one-to-one correspondence 
between the galaxies shown in the two panels. Both panels correspond to galaxy images in the F356W filter. 


hand panel (along with the addition of observed noise) 
show one (or more) foreground/background sources in 
the field of view. The figure clearly shows that galax- 
ies are not randomly organized in the plane, indicating 
that the network has learned some morphological fea- 
tures. The distributions are also similar for galaxies with 
and without noise. Galaxies with extended light distri- 
butions and with clear signs of a disk component —or 
interactions— tend indeed to appear on the upper and 
upper-left parts of the UMAP space, while more com- 
pact galaxies with smoother and concentrated light dis- 
tributions tend to be placed on the bottom-right section 
of the plane. We can also see that galaxies showing more 
elongated shapes are found on the left section of the rep- 
resentation space. Also interesting to notice is how sev- 
eral galaxies with bright companions (off-center sources) 
tend to be placed on the upper-right and bottom-right 
corners of the right-half panel (see subsubsection 3.4.1 
below for a more detailed discussion on this point). 


3.4. A morphological description of galaxies robust to 
noise and background/foreground contaminants 


We now quantify in more detail the differences be- 
tween the representations of noise-added and noiseless 
TNG50 galaxy images, and how the different augmen- 
tations of the same galaxy are represented by our con- 
trastive model. As described in subsection 3.1, one of 
the reasons for using the SimCLR framework is to ob- 
tain a data-driven representation that is robust to noise 
and other observational effects such as foreground and 
background companions. 


3.4.1. Robustness to noise 


1.0 


0.9 


0.8 


0.7 


CDF 
o 
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L— Test (augmented) 
C] Test (noise-only) 


Figure 5. Cumulative distribution function of the distance 
in the UMAP plane between pairs of the same galaxy images, 
denoted as ô. Green histogram corresponds to the distribu- 
tion of ó for the test dataset when only noise is added to the 
image, while the blue histogram shows the distribution of ó 
for the same dataset when all augmentations are applied to 
the noise-added image. 


We first quantify the effect of noise by computing the 
distance in the UMAP representation between the noise- 
less and the noise-added images of each galaxy, denoted 
as ô. For a reference of the UMAP axis ranges, the hor- 
izontal axis (UMAP 1) spans within (—1.9,9.2) and the 
vertical axis (UMAP 2) spans within (—0.9,8.2). The 
total area covered by the data points in the UMAP plane 
is approximately 65 (in the arbitrary UMAP units, see 
Figure 6, for instance). 
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Figure 6. Left-hand panel: UMAP visualization for all the TNG50 galaxy images in our dataset color-coded by the mean 
value of the distance in the UMAP representation between the noiseless and the reduced images of each galaxy (denoted as 6). 
Right-hand panel: UMAP visualization for all the TNG50 galaxy images in our dataset color-coded by the mean value of the 
ratio of the flux measured in the TNG50 stamps and the flux derived from the noiseless TNG50 stamps for the F356W filter 
(denoted as órasew). Large values of órasew indicate the presence of a secondary (or even more) source. The larger dr3s6w is, 
the brighter the companions are with respect to the central galaxy. 


In Figure 5, we show the cumulative distribution func- 
tion of 6 for the galaxy images in the test set for which 
only noise is added to the noiseless images. The rest of 
the augmentations (i.e., flip, rotation, and flux and color 
variation) are not applied. We find that 75% and 90% 
of the projections show values of ô < 1.3 and 6 < 2.1 
in the UMAP space, respectively. Also shown in the 
image is the cumulative distribution of 6 for the same 
galaxy images with all the augmentations applied, as 
described in subsection 3.2. For this dataset, we find 
that 75% and 90% of the projections show values of 
6 S 1.4 and 6 S 2.2 in the UMAP space, respectively. 
When augmentations are applied, the values of 6 < 1 are 
enlarged, but still remarkably well constrained within 
ó S 2 for 90% of the dataset compared to the test set 
with no other augmentations applied besides noise. In 
other words, a value of ô ~ 2 is analogous to saying 
that the noise and noiseless representations of the same 
galaxy pair are located within a circumference of radius 
6/2 = 1. Converted into an area, this means ~ 596 dis- 
placement in the UMAP plane for 9096 of the galaxy 
images (i.e., 6 S 2) in the test set despite the level of 
noise, contamination and augmentations applied to the 
input galaxy images, as can be seen in Figure 3 and 
Figure 4. 

We note also that the distribution of 6 for the train- 
ing and test samples are very similar. Therefore, we 
emphasize the model is not suffering from over-fitting, 
since none of the galaxy images in the test set has been 
shown previously to the network. 


3.4.2. Robustness to background/foreground contaminants 


We then check how our contrastive model behaves 
when more than one source (i.e., companions) is present 
in the stamp. To do so we measure the total flux in the 
noiseless TNG50 stamps (therefore, the intrinsic flux of 
the central galaxy) and in the TNG50 + random CEERS 
patch. The difference between both derived fluxes is a 
proxy for the presence (or not) of companions and, if 
present, how bright they are with respect to the cen- 
tral galaxy. In Figure 6, we show the UMAP plane for 
'TNG50 galaxy images in our dataset color-coded by the 
distance in the UMAP 6 and the ratio of the fluxes in 
the TNG50 stamps and the flux derived from the noise- 
less TNG50 stamps for the F356W filter (denoted as 
Ópasow). It is interesting to note how the main two 
yellow clumps in the UMAP plane where the stamps 
with bright companions (more than three times the flux 
than the flux of the central galaxy, Órasew Z 3) tend to 
concentrate, and also their correlation with large values 
ô = 2. Some of these cases can be seen in the right-hand 
panel in Figure 4. For instance, there are several exam- 
ples within these regions of dp356w Z 3 that correspond 
to TNG50 images for which the companion is so bright 
(such as a star) that the central galaxy cannot be even 
identified in the stamp. In these cases with bright com- 
panions around the central galaxy, the model tends to 
classify the brightest component (thus, the companion) 
instead of the central galaxy. 

To further illustrate the effect of companions and noise 
on the representation space, we show some examples of 
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Figure 7. Randomly chosen examples of galaxy images with 6 > 3 (see Figure 6). For each of the examples, we show noiseless 
and noise-added images in the F200W (top rows) and F356W (bottom rows) filters. Cases 1 to 5 and cases 6 to 10 correspond 
to images with extended and compact bright companions, respectively. Cases 11 to 15 correspond to images with artifacts (such 


as cosmic rays). See subsubsection 3.4.2 for more details. 


the most extreme cases (ô = 3) in Figure 7. The ma- 
jority of images with the largest values of ô are mainly 
due to the presence of bright companions in the galaxy 
images (or artifacts). On one hand, in the noise-added 
images of cases 1 to 5 (belonging to the yellow clump 
with UMAP 2 => 4 in the right-hand panel of Figure 6) 
it is possible to identify extended bright companions. 
On the other hand, in the noise-added images of cases 
5 to 10 (belonging to the yellow clump with UMAP 1 
= 6 in the right-hand panel of Figure 6) it is possible to 
identify compact bright companions, mainly stars (cases 
6,7 and 10). Finally, we also show 5 examples of galax- 
les (cases 10 to 15) belonging to the small yellow clump 
with UMAP 2 > 7 in the left-hand panel of Figure 6. 
In the F200W noise-added images of these cases, it is 
possible to identify some burned pixels and saturated 
black patches that are due to cosmic rays or artifacts 
originated during the reduction process of the CEERS 
pointings used to add noise. Therefore, finding galaxy 
images that are located in that region of the UMAP is 
indicative of some quality image problems or contami- 
nation by other sources in the images. 

Removing the cases with ô = 3 will even reduce the 
differences between the representations of the noise-only 
and the augmented datasets (Figure 5). Nevertheless, 
there is a significant fraction of TNG50 galaxy images 
with close companions that are still represented accord- 
ing to just the central galaxy in the stamps since they 
have values of ó < 2. 

Therefore, training our contrastive model with a com- 
bination of noiseless and noise-added TNG50 images 
leads to a robust representation of TNG50 images even 
in the case of the presence of companions in the image 
(at least, for those cases in which the companion is not 
extremely bright compared to the central galaxy). For 
the cases in which the companion is much brighter than 
the central galaxy, their locations in the UMAP may cer- 


tainly help to find these cases in observed images and to 
treat them carefully in subsequent analysis. 


3.5. Dependence on physical and photometric 
parameters 


An advantage of calibrating the neural network model 
with simulations is that we have access to a large num- 
ber of physical properties of the galaxies. An additional 
test for our classification scheme is, therefore, to ex- 
amine how the representation space is correlated with 
physical quantities as well as with other (more standard) 
morphological measurements. 

'To increase the number of galaxies and given the con- 
siderations described in subsubsection 3.4.1, hereafter, 
we show representations and galaxy images for the whole 
dataset and remove those cases with 6 > 3.0 (corre- 
sponding to ~ 10% of the dataset). By doing so we 
retrieve a more reliable representation of the galaxies in 
our dataset without the impact of bright companions 
and artifacts. 


3.5.1. Correlation with physical properties 


In this section, we discuss how some physical proper- 
ties extracted from the TNG50 simulation correlate with 
the representation in the UMAP plane. In Figure 8, 
we show the dependence in the UMAP plane with the 
total stellar mass (M.[Mo]), the specific angular mo- 
mentum of stars (j.[kpc km s^1]), the mass fraction in 
non-rotating stars (fnr) and the flatness (1 — f) of the 
galaxy. The mass fraction in stars that have no net 
angular momentum around the z-axis is defined using 
the circularity parameter e — J;/J(E), as in Marinacci 
et al. (2014), for every star particle. It measures the 
maximum specific angular momentum possible at the 
specific binding energy E of the star. The mass frac- 
tion in non-rotating stars mass (denoted as fpr) is then 
defined as the fractional mass of stars with e < 0 mul- 
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Figure 8. UMAP visualization for all the TNG50 galaxy images in our dataset color-coded by the distribution of several 
physical properties extracted from the TNG50 simulation. Color code corresponds to the median values in each hexagonal 
bin in the UMAP plane. From left to right and top to bottom, the different panels show: the logarithm of the total stellar 
mass (log M.[Mc]), the logarithm of the specific angular momentum of the stars (log j.[kpc km s !]), the mass fraction in 
non-rotating stars (fnr) and the galaxy flatness (1 — f). The scatter maps of these parameters are presented in Appendix A. 


tiplied by two. The flatness of the galaxy is computed 
as follows: f = c/v/ba, where c < b < a denote the 
principal axes obtained as the eigenvalues of the mass 
tensor of the stellar mass inside 2r,. The larger 1 — f 
is, the flatter the system is in 3D. Here and throughout 
the paper we refer to the definitions and measurements 
of Pillepich et al. (2019). See also subsection 5.2 for a 
more detailed discussion of the 3D shapes of the TNG50 
galaxies. 

Figure 8 shows remarkable correlations between the 
position of galaxies in the UMAP and their average 
physical properties. Overall, galaxies with larger spe- 
cific angular momentum and a flatter stellar distribution 
tend to populate the upper left region of the UMAP. 
These galaxies are also the most massive ones although 


the correlation is less clear. The bottom right corner is 
populated by rounder objects with lower specific angular 
momentum. It is also interesting to see that the tran- 
sition between the variation of the physical properties 
is smooth, translating a continuum of galaxy morphol- 
ogy structure. 

Figure 8 only shows the median values of the physical 
properties in different regions of the UMAP. In order 
to quantify how constraining are these correlations, it 
is also important to measure the scatter of the differ- 
ent properties. This is shown in the appendix A ( Fig- 
ure A1). In most cases, the scatter represents less than 
~ 2096 of the dynamical range, indicating that the distri- 
butions are overall relatively narrow and, therefore, the 
correlations with physical properties are informative. 
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We conclude that the representation space for images 
—in addition to being robust to observational and in- 
strumental effects— carries information about the kine- 
matics and intrinsic shapes of galaxies. 


3.5.2. Connection to standard morphological measurements 


In Figure 9, we show the dependence on several photo- 
metric parameters estimated by Costantin et al. (2022b) 
in the F200W filter: the effective radius (re), the Sérsic 
index (ne), the ellipticity from the Sérsic fit (1—b/a), the 
concentration parameter (C), the asymmetry (|A|) and 
the smoothness (S). There is a remarkable correlation 
between the position in the UMAP plane and ne, C, A 
and S. Gradually, ne C and S grow from left to right in 
the UMAP space, while the A does it from right to left. 
'Therefore, galaxy images with smoother, symmetric and 
concentrated light distributions are found towards the 
right section of the UMAP plane. Also important is the 
correlation with the ellipticity (1 —5/a), with more elon- 
gated galaxies lying on the left (bottom-left) section of 
the UMAP plane. To illustrate again the spread of these 
representations, we show in the appendix A (Figure A2) 
the scatter of the parameters shown in Figure 9 as the 
standard deviation divided by the mean values in each 
hexagonal bin in the UMAP plane. 

It is important to notice the existing correlation with 
the physical effective radius, re, with the largest galax- 
ies populating the top-left section of the UMAP plane. 
'This correlation with the physical size is not in contra- 
diction with the representations being independent of 
apparent galaxy sizes but reflects instead the known cor- 
relation between morphological appearance and physical 
size (e.g.,van der Wel et al. 2014a). 

Based on the previous maps calibrated with the 
'TNG50 simulation, asymmetric, more extended, flatter 
and rotationally-supported galaxies tend to populate the 
upper-left section of the UMAP representation. In more 
detail, the more to the left in the UMAP plane a galaxy 
is, the more elongated it appears. Smoother, more com- 
pact, rounder and non-rotating galaxies are located to- 
ward the bottom-right corner of the UMAP representa- 
tion. Although not shown, the results presented here are 
consistent (despite small variations) for the same mor- 
phological parameters measured in the F365W filter. 


4. SELF-SUPERVISED LEARNING 
REPRESENTATION OF JWST GALAXY 
IMAGES 


In this section, we apply the methodology described 
before to the two datasets of observed galaxies with 
JWST described in subsection 2.1. 


4.1. Representations of CEERS galaxy images 


We feed the 891 observed CEERS galaxies to our con- 
trastive model to retrieve their corresponding represen- 
tations in the 1024 dimensions space. Then, we normal- 
ize the derived features and transform them into a 2D 
vector using the same UMAP embedding obtained for 
the features of the TNG50 galaxy images. 

In the top row of Figure 10, we show a comparison of 
the UMAP representation space for our initial (TNG50 
+ random CEERS patch) and the observed CEERS 
datasets. Interestingly, observed galaxies tend to pop- 
ulate the complete UMAP plane which indicates that 
both samples share similar morphological diversity. The 
UMAP visualization is however a projection of a higher 
dimension space, which is not appropriate for outlier de- 
tection. Even if observed galaxies would not reside in 
the same manifold as simulated objects, the UMAP rep- 
resentation would tend to show them towards the edges 
of the plane but not outside. This is the behavior seen 
for observed CEERS galaxies which tend to be concen- 
trated in the edges of the UMAP cloud —towards the 
bottom and bottom-right sections— independently of 
the source redshift. Given that the mass and flux distri- 
butions of both datasets are consistent —even though 
we have not performed a careful one-to-one match be- 
tween simulations and observations— the differences in 
the distributions of points of both datasets are likely 
to originate in intrinsic differences in the morphologi- 
cal properties. Combining the distributions of points 
in Figure 10 with the information provided by Figure 8 
and Figure 9, we conclude that observed CEERS galax- 
ies occupy more frequently than the simulated TNG50 
galaxies the regions in the representation space where 
galaxies are more compact and with less specific angu- 
lar momentum. We investigate these differences in more 
detail in section 5. 


4.2. Representations of VISUAL galaxy images 


We also present a comparison of the representation 
obtained after applying our contrastive model to the 
VISUAL dataset for which visual morphological classifi- 
cations are provided (Kartaltepe et al. 2022). After se- 
lecting those galaxies with M, > 10°Mo, 3 < z < 6 and 
reliable visual classifications we end up with a dataset 
of 523 galaxies. From this dataset, there are 307 candi- 
dates (~ 59%) classified as disks in four categories: 117 
Disk, 103 Disk+Irr, 27 Disk+Sph+Irr and 60 Disk+Sph. 
On the other hand, 202 galaxies (~ 38%) are classified 
as spheroids in three categories: 92 Sph, 23 Sph+Irr and 
, 27 Disk+Sph+Irr and 60 Disk+Sph. Finally, there are 
82 galaxies classified as Irr (~ 16%), 4 as point sources 
and 2 as unclassifiable. 
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Figure 9. UMAP visualization for all the TNG50 galaxy images in our dataset color-coded by the distribution of several 
morphological and photometric parameters. Color code corresponds to the median values in each hexagonal bin in the UMAP 
plane. From left to right and top to bottom, the different panels show: the logarithm of the effective radius (re[kpc]), in kpc), 
the Sérsic index (ne), the ellipticity based on Sérsic fit (1 — b/a), the concentration (C), the asymmetry (A) and the smoothness 
(C). The scatter maps of these parameters are presented in Appendix A. 


In the bottom row of Figure 10, we show the rep- 
resentation of the galaxy in the VISUAL dataset in 
the UMAP plane for the various morphological groups 
based on the provided visual classifications. Although 
the mass and redshift selection of the galaxies is based on 
different estimators (JWST photometry for the CEERS 
dataset and CANDELS photometry for the VISUAL 
dataset), the distribution of the representations in the 
UMAP plane for the VISUAL galaxies is similar to the 
CEERS representations (i.e., a significant fraction of 
galaxies occupy the bottom section of the UMAP plane). 

The figure reveals some expected correlations with 
the traditional visual morphology. It is reassuring that 
Disk+Spheroid and Spheroid groups from the VISUAL 
catalog populate the bottom-right section of the UMAP 
plane, where compact, non-rotating galaxies with low 
angular momentum (according to TNG50 properties) 
are expected to be. However, we notice that galaxies 
classified as Disk, Irregular and/or Disk+Irregular in 
VISUAL are distributed throughout the plane even to- 
wards the bottom-right section of the UMAP very close 
to where spheroids lie. As shown in Figure 8 and Fig- 
ure 9, this lower right region of the UMAP where, ac- 
cording to VISUAL, disk-like morphologies are located 
corresponds to galaxies in TNG50 with physical and 


photometric properties typically shared by spheroidal 
systems, such as low specific angular momentum, large 
mass fractions in a non-rotating component, low flatness 
and large Sérsic indexes. This rises interesting questions 
about the true nature of these disks that we discuss in 
section 5. 


5. IMPLICATIONS AND DISCUSSION: THE 
OPTICAL REST-FRAME MORPHOLOGIES OF 
HIGH REDSHIFT GALAXIES 


In this section, we examine in more detail the differ- 
ences found in previous sections between the simulated 
'TNG50 and the observed JWST galaxy images. We also 
discuss the implications of our results for the nature of 
visually identified disks at z > 3 in the real Universe. 


5.1. Does the TNG50 model reproduce the 
morphologies of observed galaxies? 


5.1.1. Distribution of representations 


'The representations of the simulated TNG50 and the 
observed CEERS galaxy images inferred by our con- 
trastive model seem to be distributed differently (sub- 
section 4.1 and subsection 4.2). Observed CEERS galax- 
ies tend to concentrate in the bottom section of the 
UMAP plane, while simulated TNG50 galaxies expand 
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Figure 10. Comparison of distributions of observed and simulated galaxies in the representation space. The top row shows the 
CEERS mass-complete sample and the bottom row the VISUAL sample with available visual morphologies. Top left-hand panel: 
UMAP visualization for the observed CEERS galaxy images selected in mass and redshift (color-coded by the source photo-z) 
overlapped with the representation of noise-added TNG50 galaxy images. Top middle panel: randomly chosen observed CEERS 
galaxy images in the UMAP visualization in the F200W filter. Top right-hand panel: randomly chosen observed CEERS galaxy 
images in the UMAP visualization in the F356W filter. Bottom left-hand panel: UMAP visualization for the observed VISUAL 
galaxy images selected in mass and redshift. Points are colored according to the visual classifications into several non-exclusive 
classes. Bottom middle panel: randomly chosen observed VISUAL galaxy images in the UMAP visualization in the F200W 
filter. Bottom right-hand panel: randomly chosen observed VISUAL galaxy images in the UMAP visualization in the F356W 


filter. 


over the whole UMAP range with similar number den- 
sities. As previously mentioned, the UMAP represen- 
tation is not well suited for the detection of outliers. 
Therefore, even if observed galaxies seem to overall lie 
in the same region as simulated ones, they can still live 
in different manifolds in the higher dimensionality rep- 
resentation space. 

'To further quantify this distribution shift, we first de- 
rive the distance - in the 1 024 dimensionality space - to 
the 10th closest T'NG50 neighbor for each galaxy in the 
VISUAL and CEERS datasets (619). In order to have a 
fair reference distribution, we select, for each observed 
galaxy, the closest simulated one in the representation 
space and compute also 619. If both datasets - observed 
and simulated - live in the same manifold the distribu- 
tion of distances should be similar. If on the contrary, 
observed galaxies populate different regions of the pa- 
rameter space, their representations should be more iso- 
lated and therefore we should measure larger values of 
619. The distribution of distances ó19 is shown in Fig- 
ure 11. We clearly see that the distributions for observed 
galaxies are shifted towards larger values compared to 
the reference distribution. This indicates that observed 


galaxies are on average more isolated than simulated 
ones in the representation space. This separation could 
be interpreted as an additional indication that the rep- 
resentations obtained for the TNG50 and the JWST ob- 
servations do not exactly live in the same manifold. 

To better understand these measured discrepancies, 
we quantify the differences between observed and simu- 
lated galaxies in terms of more standard morphological 
properties in Figure 12. We show the distributions of 
observed and simulated galaxies in the log M, — log re, 
log M. — log ne and log M. — b/a planes in four redshift 
bins. To divide in redshift, we take all galaxies in a given 
snapshot for the simulation and observations are associ- 
ated with the closest snapshot based on the photometric 
redshift. It should be kept in mind that this figure (as 
the previous ones) do not include all TNG50 galaxies, 
but those for which the JWST mocks are available for a 
field-of-view larger than 64 x 64 pixels (see subsection 2.2 
and Figure 2 for more details). Given the small amount 
of galaxies removed, we do not expect the distributions 
to change significantly though. 

It is manifest, firstly, that the TNG50 simulated galax- 
ies overlap with CEERS observed ones in the parame- 
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Figure 11. Probability density functions of the distances 
to the 10th closest neighbor in the 1024 dimensions of the 
representation space, denoted as 619. Solid histograms corre- 
spond to the distance to the 10th closest neighbor in TNG50 
of the closest TNG50 neighbor of each galaxy in the VI- 
SUAL (in blue) and the CEERS (in green) datasets. Dashed 
histograms correspond to the distance to the 10th closest 
neighbor in TNG50 of each galaxy in the VISUAL (in blue) 
and the CEERS (in green) datasets. 


ter spaces of Figure 12. This is per se, again, a non- 
negligible confirmation of the zeroth-order good func- 
tioning of the underlying model. However, it is also 
apparent, differently than what could be deduced from 
the representation space distributions, that the TNG50 
galaxies studied here actually exhibit less galaxy-to- 
galaxy variation in sizes, Sersic indices and shapes than 
CEERS observed galaxies, at fixed stellar mass and 
redshift. Furthermore, the TNG50 simulation predicts 
galaxies with larger sizes (at z = 3—4, but not z = 5—6), 
with smaller values of ne (at all z = 3 — 6) and that are 
rounder in projection (more so the higher the redshift) 
than what is measured in CEERS. These differences at 
least partly explain the different distributions in the rep- 
resentation space of contrastive learning and also go in 
the expected direction of observed galaxies mainly pop- 
ulating the bottom right corner of the UMAP. 

These reported differences could originate from a 
resolution-induced effect (see e.g., Zanisi et al. 2021) 
or could be an indication of more fundamental physi- 
cal differences. We note as well that the stellar masses 
reported for the TNG50 simulation correspond to the 
3D stellar mass, while those obtained for the CEERS 
dataset are based on the SED fitting to the JWST pho- 
tometry. Also, the Sersic parameters for the TNG50 and 
the CEERS galaxies are derived using different method- 
ologies: for the T'NG50, the morphological parameters 
are obtained with statmorph (as described in Costantin 


et al. 2022b); while for the CEERS dataset, they are de- 
rived with galfit. More in-depth comparisons of sim- 
ulated and observed data—likely beyond images—are 
required to reach a final conclusion. 


5.1.2. Morphological classes 


As an additional way to quantify the differences be- 
tween TNG50 galaxies and observed CEERS galaxies, 
we compare the abundances of TNG50 and CEERS 
galaxies retrieved from the separation into two main 
classes (for simplicity) using a clustering technique. In 
particular, we apply the k—means algorithm to clus- 
ter data in the representation space by trying to sep- 
arate samples in k groups of equal variance, minimizing 
a criterion known as the inertia or within-cluster sum- 
of-squares (WCSS). 

In Figure 13, we show the UMAP visualization color- 
coded by the two classes for the simulated TNG50 and 
the observed CEERS datasets. Note that galaxies with 
artifacts or bright companions are not included in the 
derivation of the different class fractions hereafter, al- 
though they are shown in the UMAP visualization panel. 
The clustering algorithm naturally separates the upper 
and lower regions of the UMAP. Given the division, we 
denote those galaxies located in the upper-left section of 
the UMAP as eztended (E) galaxies, while those located 
in the bottom-right section as compact (C) galaxies. We 
find that ~ 5296 and ~ 48% of the galaxies belong to the 
E and C class, respectively. To reinforce the conclusion 
that the model is robust to noise, we find a 92% agree- 
ment between the two classes for the noiseless and the 
noise-added TNG50 dataset. This can be interpreted 
as classes being consistent independently of the galaxy 
image shown to the contrastive model, i.e., the noiseless 
or the noise-added version. 

The fraction of galaxies belonging to the compact (C) 
class as a function of the stellar mass for both observed 
and simulated galaxies confirms that the TNG50 model 
systematically under-predicts the abundance of compact 
galaxies. 


5.2. Are visually classified disks really disky? 


We now combine the visual classifications with the 
positions of galaxies in the representation space of con- 
trastive learning—which have been shown to correlate 
with physical properties—to try to establish new con- 
straints on the abundance of disks at z > 3. As shown 
in section 4.1, visually classified disks are spread all over 
the representation space which suggests that disks rep- 
resent a heterogeneous group of galaxies with different 
physical properties. 


5.2.1. 3D shapes from the representation space 
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Figure 12. Distribution of the logarithm of the effective radius (logre in kpc, top row), the Sérsic index (ne, middle row), and 
the axis ratio (b/a, bottom row) as a function of the logarithm of the stellar mass (log M.) for the TNG50 (in blue) and the 
CEERS (in green) datasets. From left to right, the panels show the distributions at z — 3,4,5,6 for the TNG50 dataset. For 
the CEERS dataset, galaxies are included in the closest redshift value. Contour levels enclose 2596, 5096 and 7596 of the data. 
'The photometric parameters shown are measured in the F200W filter. 
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Figure 13. Left-hand panel: UMAP visualization of noise-added TNG50 galaxy images color-coded by classes according to 
the k-means method for two clusters: Extended (E) class (in blue) and Compact (C) class (in red). Middle panel: same as 
left-hand panel but for the CEERS galaxy images. Right-hand panel: fractions Compact (C) galaxies in TNG50 (solid lines) 
and CEERS (dashed lines) in 5 logarithmic mass bins of width 0.5 dex in the range 9 <= log(M/Mg) <= 11.5. The shaded 
regions correspond to the fraction errors considering Poisson errors in the number of selected galaxies and the total number of 
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We start by exploring in more detail how the con- 
trastive learning representation space distributes galax- 


ies with different 3D structures of the stellar mass dis- 
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Figure 14. Minor-to-major (s) versus intermediate-to- 
major (q) axis ratios of the stellar mass distribution for the 
TNG50 galaxies analyzed in this work. Galaxy shapes are 
split into Oblate (in blue), Spheroid (in orange) and Pro- 
late (in red) according to the definition of van der Wel et al. 
(2014b) and the color code used in Pillepich et al. (2019). 


tribution. We characterize the shape of galaxies in 
the TNG50 sample as done and studied in Pillepich 
et al. (2019), i.e., with an ellipsoid with three semi-axes 
c « b < a and use the axial ratios q = b/a (intermediate- 
to-major) and s — c/a (minor-to major) to define three 
main 3D shape classes: oblate, spheroid and prolate fol- 
lowing the definitions of van der Wel et al. (2014b) and 
Zhang et al. (2019). The axial ratios are derived after 
diagonalizing the stellar mass tensor in an iterative way 
while keeping the major axis length fixed to 2r,. We 
consider oblate or disky galaxies those with a ~ b > c, 
elongated or prolate objects those with a > b — c and, 
finally, spheroidal systems those with similar values for 
the three semi-axes. Figure 14 shows the distribution 
of the TNG50 sample used in this work (z > 3 and 
log M./Mo > 9) in the s — q plane used by van der Wel 
et al. (2014b) to define the three main 3D shapes. Note 
that, by definition, the 20 projections of the TNG50 
galaxies have the same 3D shape. At these redshifts, we 
find that — 6796 of the galaxies in the simulation have 
a spheroidal shape (828 out of 1238) according to this 
definition. Only 217 (17%) and 193 (16%) have oblate 
and prolate shapes, respectively. The fact that at high 
redshift and low stellar masses, galaxies tend to present 
a more prolate structure has been found both in obser- 
vations (van der Wel et al. 2014b; Zhang et al. 2019) 
and simulations (Tomassetti et al. 2016; Pillepich et al. 
2019). 

We explore in Figure 15 the distribution of the mock 
images of prolate, oblate and spheroid galaxies in the 


UMAP projection of the representation space obtained 
with contrastive learning. We consider for this exercise 
the 20 projections of the same galaxy as independent ob- 
jects which explains the larger number of objects com- 
pared to Figure 14. Interestingly, disky galaxies tend 
to be located roughly in the upper-left half section of 
the UMAP manifold, while elongated systems are dis- 
tributed all over the UMAP plane. Although spheroidal 
galaxies populate all the plane because they vastly dom- 
inate in numbers, the results of Figure 15 suggest that 
galaxies located in the bottom-right corner of the UMAP 
(i.e., outside the 95% contour level) are very unlikely to 
be disky even if they appear as elongated in the images. 
However, Figure 10 shows that visually classified disks 
are distributed all over the place and that only a small 
fraction is located in the region where oblate systems 
are expected to be. It suggests again that not all the 
visually classified disks have the same properties. 


5.2.2. Oblate systems 


Based on the previous result, we select galaxies with a 
disk visual morphology and a high probability of being 
oblate in shape to further study their properties. These 
systems should be considered as true flat disk candi- 
dates. By disk visual morphology we denote all galaxies 
belonging to one of the following classes: Disk, Disk + 
Irr, Disk + Sph + Irr and Disk + Sph. We then de- 
fine as oblate disk candidates those galaxies with a disk 
visual classification which are located within the 9596 
contour for the oblate systems in the TNG50 dataset 
(see Figure 15). We note that only one galaxy classi- 
fied as Sph and two classified as Sph + Irr fall within 
the oblate contour defined that way. We find 128 oblate 
disk candidates out of 307 galaxies visually classified as 
disks, which corresponds to ~ 42% of the galaxies with 
a disk visual morphology. In detail, from the 128 candi- 
dates for oblate disk: 50 galaxies are visually classified 
as Disk, 8 galaxies are classified as Disk + Sph, 61 galax- 
ies are classified as Disk + Irr and 9 galaxies as Disk + 
Sph + Irr. In Figure B1, we show the galaxy images 
of the oblate disk candidates. They all present extended 
light distributions with, in many cases, signs of internal 
structure or irregularities. 


5.2.3. Non-oblate systems 


Following a similar approach, we now turn our atten- 
tion towards the objects that, according to our repre- 
sentation, are very unlikely to be true oblate disks, but 
are visually classified as such. This should allow a bet- 
ter understanding of the morphological properties that 
drive the visual classification into disks. We focus on 
those galaxies visually classified as disks (including all 
the different disk categories) that are located outside the 
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Figure 15. Location in the UMAP plane of TNG50 oblate (blue points in the left-hand panel), spheroid (orange points in 
the middle panel) and prolate (red points in the right-hand panel) galaxies according to the 3D shape inferred from the stellar 
particles. Morphologically-disky (i.e., oblate) galaxies tend to populate the upper regions of the UMAP plane, while prolate 
and spheroid systems cover the whole UMAP space. The contour levels indicate the 2596, 5096 and 9596 probabilities 


9596 probability contour for oblate systems. Hereafter, 
we denote these galaxies as non-oblate disk galaxies. We 
find that ~ 58% (179 out of the 307) of the galaxies with 
disk visual morphologies fulfill this selection criterion. In 
particular, out of these 179 non-oblate disk candidates, 
we find that 67 are classified as Disk, 52 are classified 
as Disk + Sph, 42 are classified as Disk + Irr and 18 
as Disk + Sph + Irr. On the other hand, among the 
non-oblate systems, 91 galaxies are classified as Sph, 21 
are classified as Sph + Irr and 30 are classified as Irr. 
In Figure B2, we show some examples of non-oblate 
galaxies that are visually classified as Disk, Disk+Irr 
and Irr, along with some examples of Sph for compari- 
son. Most of the disk galaxies present a more elongated 
shape in the 2D image compared to the spheroid galax- 
ies which can be interpreted as a disk signature. This 
elongation is most likely driving the visual classification 
toward disk-like morphologies. In fact, the example im- 
ages of galaxies lying in a similar region of the UMAP 
but visually classified as Sph are, on average, rounder 
than the ones visually classified as Disk or Disk--Irr. 


5.2.4. Two populations of visually classified disks 


'The results from the previous subsections suggest that 
there are two different populations of visually classified 
disks which we call oblate disk and non-oblate disk can- 
didates. This result relies somehow on the calibration 
of the representation space with the T'NG50 simulation 
which, as shown in subsection 5.1, might introduce some 
unknown biases because the morphologies of observed 
and simulated galaxies do not perfectly match. Nev- 
ertheless, it shows that prolate or spheroidal systems 
might appear elongated in noise-added 2D images and, 
therefore, be misidentified as disks. This effect should 


be independent of the degree of agreement between sim- 
ulated and observed galaxies. 

We now examine the properties of the two populations 
using standard projected shape measurements in a sim- 
ilar way as done in Zhang et al. (2019). In Figure 16, 
we show the distribution in the axis-ratio/semi-major 
axis plane (b/a — loga) of oblate disk and non-oblate 
disk candidates (along with non-oblate spheroids galax- 
ies for comparison) according to the contrastive learn- 
ing approach. We denote by non-oblate spheroid galax- 
ies those objects visually classified as Sph and located 
outside the 9596 probability contour for oblate systems. 
The b/a and a values in Figure 16 are derived from Ser- 
sic fits to the F356W light profiles (see section 2 for more 
details). 

To make this comparison, we remove from the oblate 
class some objects which cluster in the UMAP plane 
at 0.5 S UMAP 1 < 2 and —0.5 S UMAP 2 <1 in 
Figure 10. Indeed, after a visual inspection, many of 
them show hints of a double-nucleus, interacting galax- 
ies or galaxies with a close companion comparable in 
brightness to the central galaxy, which are therefore 
very unlikely to be disks. It is interesting though that 
these systems are located in a very specific region of the 
representation space. In Figure B3 and Figure B4 in 
Appendix B, we show 20 randomly chosen examples of 
these galaxies. They also look elongated in projection. 

The Figure 10 shows that oblate disk candidates 
present a relatively flat distribution of b/a, as one would 
expect from a pure projection effect. They also have 
larger semi-major axes than the average population 
(peaking at loga ~ 0.2 — 0.3), as also expected for disk 
galaxies. However, the non-oblate disk candidates are 
more concentrated towards smaller values of b/a < 0.6 
and smaller semi-major axis (peaking at loga ~ 0.0). 
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Figure 16. The projected b/a—log a distribution of CEERS galaxies observed with JWST and with available visual classification 
for: oblate disk candidates (left-hand panel), non-oblate disk candidates (middle panel) and non-oblate spheroid candidates 
(right-hand panel). All quantities are measured in the F356W filter. Note that cases falling in the region where double-nucleus 
and/or interacting galaxy are not included in the oblate disk candidates. 
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Figure 17. Probability density functions of Sèrsic index 
measured in the F356W filter for oblate disk (solid blue his- 
togram), non-oblate disk (dashed red histogram) and non- 
oblate spheroid candidates (shaded red histogram) in the VI- 
SUAL dataset of JWST observed galaxies. 


The lack of round systems in this population is also an 
indication that their elongated projected shapes are not 
only a projection effect but it is more a consequence 
of their intrinsic shape. This confirms the impression 
that these galaxies are visually classified as Disk and Irr 
mostly due to their elongated projected shapes, even if 
their intrinsic shapes are more likely to be prolate or 
spheroidal. For comparison, spheroids do clearly popu- 
late the region of high b/a values, 

In Figure 17, we finally show the distribution of ne 
for oblate disk, non-oblate disk and non-oblate spheroid 
candidates. It is clear how the distribution of n, for the 
oblate disk candidates peaks at ne S; 1, characteristic of 


^ 


an exponential profile. Contrarily, the ne distributions 


for non-oblate disk and non-oblate spheroid candidates 
are more skewed towards larger values of the Sérsic in- 
dex. This result is again indicative of the different prop- 
erties of galaxies Disk visual morphologies. 

Our results are also in qualitative agreement with 
the predictions of zoom-in cosmological simulations with 
fewer systems but somewhat better resolution and dif- 
ferent underlying galaxy formation models (e.g. Tomas- 
setti et al. 2016) which have measured a mass-dependent 
transition between prolate and oblate stellar shapes at 
high redshift. A more in-depth comparison of the phys- 
ical properties of these populations and the predictions 
from different galaxy formation models is out of the 
scope of this paper, but should definitely provide ad- 
ditional clues. 


6. SUMMARY AND CONCLUSIONS 


'This work presents a novel data-driven method based 
on contrastive learning to infer the morphological prop- 
erties of galaxies observed with JWST. The method is 
calibrated on mock JWST galaxy images extracted from 
the TNG50 cosmological simulation that, thanks to its 
large number of qualitatively-realistic galaxies, allows us 
to produce a morphological description —without any 
assumption on galaxy classes— robust to noise, galaxy 
size, color and orientation. In addition to the robustness 
to noise, we show that the obtained representations of 
galaxies based on their images correlate well with some 
other physical properties inferred from the simulation 
(such as the specific angular momentum of stars, jx, and 
the intrinsic 3D shape) along with some measured pho- 
tometric and structural properties (such as Sersic index 
and ellipticity). 

We have applied the method to JWST images from 
the CEERS survey in the F200W and F356W bands of: 
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1) a mass-complete sample (M, > 10°Mo) of galaxies 
at 3 < z < 6 in the CEERS survey; and 2) a mass- and 
a redshift-selected sample of CEERS galaxies at 3 « 
z < 6 with M, > 10°Mo for which visual morphological 
classifications are available. 

Our main results are: 


e Simulated galaxies from the TNG50 cosmological 
simulation seem to cover well the observed mor- 
phological diversity at z > 3. However, the mor- 
phological distributions of CEERS and simulated 
galaxies are measured to be different. When com- 
pared at the pixel level, simulated and observed 
galaxies seem to populate in different proportions 
the different regions of the TNG50-trained man- 
ifolds. We show that these differences can be at 
least partly explained because observed galaxies 
can be more compact and more elongated than 
simulated ones. In fact, the galaxy-to-galaxy 
variation in sizes, Sersic indices and shapes at 
fixed stellar mass and redshift is larger in the ob- 
served CEERS population than in TNG50 simu- 
lated ones. 


Our morphological description also suggests that 
visually classified disks comprise two different pop- 
ulations: one made of írue flat disks and an- 
other more compatible with having a prolate or 
spheroidal shape. A significant fraction of galax- 
ies (~ 58%) that are visually classified as disks 
are indeed located in the representation space very 
close to compact spheroids and therefore are more 
consistent with having a prolate or spheroidal stel- 
lar structure. Although some of these conclusions 
are affected by the calibration with the TNG50 
model, our study robustly confirms that some ob- 
jects with a prolate or spheroidal intrinsic shape 
are elongated in the images and can be misclas- 
sified as disks. The coexistence of prolate and 


oblate systems at high redshift is in qualitative 
agreement with the predictions of other models 
(e.g. zoom-in simulations) which also found that 
low-mass galaxies at high-z tend to present a pro- 
late shape (Ceverino et al. 2015; Tomassetti et al. 
2016). More in-depth follow-up of these two pop- 
ulations of galaxies, possibly with spectroscopy, is 
required to further their true nature. 
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Figure Al. UMAP visualization for all the TNG50 galaxy images in our dataset color-coded by the distribution of several 
physical properties extracted from the TNG50 simulation. From left to right and top to bottom, the different panels show the 
UMAP plane color-coded by the NMAD of: the logarithm of the total stellar mass (log M,.[Mo]), the logarithm of the specific 
angular momentum of the stars (log j.[kpc km s^ !]), the mass fraction in non-rotating stars (far) and the galaxy flatness (1— f). 


APPENDIX 


A. SCATTER OF PHYSICAL AND PHOTOMETRIC PARAMETERS IN THE UMAP VISUALIZATION 


In Figure A1 and Figure A2, we show the variability in the physical and photometric parameters, respectively, 
in the UMAP visualization shown in Figure 9 in subsection 3.5. The scatter is quantified as a normalized median 
absolute deviation, denoted here as NMAD. The median absolute deviation (MAD), defined as MAD(y) = median(|y — 
median(y)|), is a robust measure of the variability of a univariate sample of quantitative data. The MAD is less affected 
by outliers and non-gaussianity than the typical variance and standard deviation. To facilitate the comparison between 
different variables, we normalized the MAD by the dynamical range of the data, defined as the percentile range 
containing 98% of the data. The resulting NMAD is an indicator of the variability of the data that, in this case, 
shows how informative the correlation with the different parameters shown in Figure 9 are. Values of NMAD < 0.2 
are indicative of a low variability of the data. 
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Figure A2. UMAP visualization for all the TNG50 galaxy images in our dataset color-coded by the distribution of several 
morphological and photometric parameters. From left to right and top to bottom, the different panels show the UMAP plane 
color-coded by the NMAD of: the logarithm of the effective radius (re[kpc]), the Sérsic index (ne), the ellipticity based on Sérsic 
fit (1 — b/a), the concentration (C), the asymmetry (A) and the smoothness (S). 


Figure B1. Examples of 20 galaxy images considered as oblate disks candidates in the VISUAL dataset. Images are framed 
in color according to the visual classification (blue for Disk). Each galaxy image is shown in the two F200W (top row) and the 


F356W (bottom row) filters. See subsubsection 5.2.2 for more details. 


B. EXAMPLES OF OBSERVED JWST GALAXY IMAGES 


In Figure B1, Figure B2, Figure B3 and Figure B4, we show some examples of galaxies in the CEERS and the 
VISUAL datasets according to different selection criteria. 
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Figure B2. Examples of VISUAL galaxy images denoted as non-oblate galaxies. Different rows show examples of the various 
visual morphologies according to the VISUAL project, as follows: images framed in blue show examples of galaxies classified 
as Disks; images framed in cyan show examples of galaxies classified as Disks+IJrr; images framed in purple show examples of 
galaxies classified as Irr; images framed in red show examples of Sph galaxies. For each galaxy type, we show images in the 
F200W (upper row) and F356W (bottom row) bands. 
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Figure B3. Examples of double nucleus or interacting galaxy images extracted from the CEERS dataset in the F200W (upper 
row) and F356W (bottom row) filters. Galaxies are selected in the UMAP plane as follows: 1 £ UMAP 1 < 2 and 0 S$ UMAP 


2 <1. See Figure 10 and subsection 4.1 for more details. 
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Figure B4. Examples of double nucleus or interacting galaxy images extracted from the VISUAL dataset in the F200W (upper 


row) and F356W (bottom row) filters. Galaxies are selected in the UMAP plane as follows: 1 £ UMAP 1 < 2 and 0 S$ UMAP 
2 S; 1. See Figure 10, subsection 4.2 and subsubsection 5.2.4 for more details. 


